Overview

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   420         424
Missing cells (%)               7.8%        7.9%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
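
The percentage and per-record figures can be rederived from the counts in this table. The sketch below assumes that missing cells are measured against all cells (variables times observations) and that 1 KiB is 1024 bytes; the function names are illustrative and are not part of ydata-profiling.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of missing cells among all cells, in percent."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def average_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, given the total in-memory size in KiB."""
    return total_kib * 1024 / n_observations


# Figures taken from the Dataset statistics table.
pct_a = missing_cell_pct(420, 12, 446)
pct_b = missing_cell_pct(424, 12, 446)
record_size = average_record_size_bytes(45.3, 446)

print(f"Dataset A missing cells: {pct_a:.1f}%")
print(f"Dataset B missing cells: {pct_b:.1f}%")
print(f"Average record size: {record_size:.1f} B")
```

Because 45.3 KiB is itself a rounded figure, the record size only agrees with the table after rounding to one decimal place.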

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

High Correlation
  Dataset A: Fare is highly overall correlated with Pclass
  Dataset B: Alert not present in this dataset
High Correlation
  Dataset A: Survived is highly overall correlated with Sex
  Dataset B: Survived is highly overall correlated with Sex
High Correlation
  Dataset A: Pclass is highly overall correlated with Fare
  Dataset B: Alert not present in this dataset
High Correlation
  Dataset A: Sex is highly overall correlated with Survived
  Dataset B: Sex is highly overall correlated with Survived
Missing
  Dataset A: Age has 82 (18.4%) missing values
  Dataset B: Age has 81 (18.2%) missing values
Missing
  Dataset A: Cabin has 337 (75.6%) missing values
  Dataset B: Cabin has 342 (76.7%) missing values
Unique
  Dataset A: PassengerId has unique values
  Dataset B: PassengerId has unique values
Unique
  Dataset A: Name has unique values
  Dataset B: Name has unique values
Zeros
  Dataset A: SibSp has 292 (65.5%) zeros
  Dataset B: SibSp has 303 (67.9%) zeros
Zeros
  Dataset A: Parch has 338 (75.8%) zeros
  Dataset B: Parch has 332 (74.4%) zeros
Zeros
  Dataset A: Fare has 9 (2.0%) zeros
  Dataset B: Fare has 9 (2.0%) zeros
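
The Missing and Zeros alerts pair a raw count with its share of the 446 observations. A minimal sketch of how such a message could be assembled is shown below; `count_alert` is a hypothetical helper written for this illustration, not the function ydata-profiling uses.

```python
def count_alert(variable: str, count: int, n_observations: int, kind: str) -> str:
    """Format an alert such as 'Age has 82 (18.4%) missing values'."""
    pct = 100.0 * count / n_observations
    return f"{variable} has {count} ({pct:.1f}%) {kind}"


# Counts taken from the Alerts section; both datasets have 446 observations.
print(count_alert("Age", 82, 446, "missing values"))
print(count_alert("Cabin", 342, 446, "missing values"))
print(count_alert("SibSp", 303, 446, "zeros"))
```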

Reproduction

                         Dataset A                    Dataset B
Analysis started         2023-10-24 11:38:52.006851   2023-10-24 11:38:59.504695
Analysis finished        2023-10-24 11:38:59.502369   2023-10-24 11:39:06.316978
Duration                 7.5 seconds                  6.81 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
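
The Duration row is the difference between the two timestamps above it. The sketch below recomputes it with the standard library; the timestamp format string is an assumption read off the values shown in the table.

```python
from datetime import datetime

# Assumed layout of the timestamps as printed in the Reproduction table.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two report timestamps."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


duration_a = duration_seconds("2023-10-24 11:38:52.006851", "2023-10-24 11:38:59.502369")
duration_b = duration_seconds("2023-10-24 11:38:59.504695", "2023-10-24 11:39:06.316978")

print(f"Dataset A: {duration_a:.2f} s")
print(f"Dataset B: {duration_b:.2f} s")
```

The timestamps also show that the two analyses ran back to back: Dataset B started about 2 ms after Dataset A finished.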

Variables

PassengerId
Real number (ℝ)

                Dataset A   Dataset B
Distinct        446         446
Distinct (%)    100.0%      100.0%
Missing         0           0
Missing (%)     0.0%        0.0%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            445.52018   438.15471
Minimum         1           1
Maximum         891         891
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     1           1
5-th percentile             37          48
Q1                          219         216.25
median                      448         419
Q3                          663.75      662.25
95-th percentile            845.75      851.75
Maximum                     891         891
Range                       890         890
Interquartile range (IQR)   444.75      446
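
Range and IQR are simple differences of the other rows: Range is Maximum minus Minimum, and IQR is Q3 minus Q1. A short check against the PassengerId figures, with illustrative function names:

```python
def value_range(minimum: float, maximum: float) -> float:
    """Range: distance between the smallest and largest value."""
    return maximum - minimum


def interquartile_range(q1: float, q3: float) -> float:
    """IQR: distance between the first and third quartile."""
    return q3 - q1


# PassengerId quantiles from the table above.
print(value_range(1, 891))                    # both datasets
print(interquartile_range(219, 663.75))       # Dataset A
print(interquartile_range(216.25, 662.25))    # Dataset B
```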

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                258.20118       258.77
Coefficient of variation (CV)     0.57954991      0.59059047
Kurtosis                          -1.1891536      -1.1980558
Mean                              445.52018       438.15471
Median Absolute Deviation (MAD)   221.5           218.5
Skewness                          -0.0058509045   0.071107097
Sum                               198702          195417
Variance                          66667.85        66961.911
Monotonicity                      Not monotonic   Not monotonic
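
Several rows of this table are tied together by standard identities: the mean is Sum divided by the number of non-missing values, the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean. The sketch below checks them for Dataset A; small discrepancies are expected because the report prints rounded values.

```python
def mean_from_sum(total: float, n_values: int) -> float:
    """Mean as sum divided by the number of non-missing values."""
    return total / n_values


def coefficient_of_variation(std: float, mean: float) -> float:
    """CV: standard deviation expressed relative to the mean."""
    return std / mean


# PassengerId, Dataset A (no missing values, so n = 446).
mean_a = mean_from_sum(198702, 446)
cv_a = coefficient_of_variation(258.20118, mean_a)
variance_a = 258.20118 ** 2

print(f"mean     = {mean_a:.5f}")
print(f"CV       = {cv_a:.8f}")
print(f"variance = {variance_a:.2f}")
```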
Histogram with fixed size bins (bins=50)
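
A histogram with fixed size bins divides the span between the minimum and the maximum into equal-width intervals. The sketch below shows that construction for PassengerId with 50 bins; it is the textbook equal-width scheme, and the plotting code behind the report may place its edges slightly differently.

```python
def fixed_size_bin_edges(minimum: float, maximum: float, bins: int) -> list[float]:
    """Return bins + 1 equally spaced edges covering [minimum, maximum]."""
    width = (maximum - minimum) / bins
    return [minimum + i * width for i in range(bins + 1)]


# PassengerId spans 1 to 891 in both datasets.
edges = fixed_size_bin_edges(1, 891, 50)
bin_width = edges[1] - edges[0]

print(len(edges), "edges, bin width", round(bin_width, 2))
```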
Common values, Dataset A
Value                Count   Frequency (%)
797                  1       0.2%
109                  1       0.2%
197                  1       0.2%
244                  1       0.2%
674                  1       0.2%
523                  1       0.2%
605                  1       0.2%
74                   1       0.2%
649                  1       0.2%
248                  1       0.2%
Other values (436)   436     97.8%

Common values, Dataset B
Value                Count   Frequency (%)
672                  1       0.2%
98                   1       0.2%
633                  1       0.2%
652                  1       0.2%
311                  1       0.2%
862                  1       0.2%
80                   1       0.2%
226                  1       0.2%
343                  1       0.2%
254                  1       0.2%
Other values (436)   436     97.8%

Smallest values, Dataset A
Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
4       1       0.2%
5       1       0.2%
6       1       0.2%
7       1       0.2%
8       1       0.2%
10      1       0.2%
13      1       0.2%
14      1       0.2%

Smallest values, Dataset B
Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
3       1       0.2%
6       1       0.2%
8       1       0.2%
10      1       0.2%
12      1       0.2%
13      1       0.2%
15      1       0.2%
17      1       0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value          Dataset A   Dataset B
0              269         285
1              177         161

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   1           0
2nd row   1           1
3rd row   1           0
4th row   0           1
5th row   0           1

Common Values

Dataset A
Value   Count   Frequency (%)
0       269     60.3%
1       177     39.7%

Dataset B
Value   Count   Frequency (%)
0       285     63.9%
1       161     36.1%
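
Each Frequency (%) entry is the value's count divided by the number of observations. A minimal sketch that rebuilds the percentages from the Survived counts; `frequency_table` is an illustrative name rather than a ydata-profiling function.

```python
def frequency_table(counts: dict[str, int]) -> dict[str, str]:
    """Map each value to its share of the total count, as a percentage string."""
    total = sum(counts.values())
    return {value: f"{100.0 * n / total:.1f}%" for value, n in counts.items()}


# Survived counts for each dataset, from the Common Values tables.
survived_a = frequency_table({"0": 269, "1": 177})
survived_b = frequency_table({"0": 285, "1": 161})

print(survived_a)
print(survived_b)
```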

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots of common values for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
0 269
60.3%
1 177
39.7%
ValueCountFrequency (%)
0 285
63.9%
1 161
36.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 269
60.3%
1 177
39.7%
ValueCountFrequency (%)
0 285
63.9%
1 161
36.1%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 269
60.3%
1 177
39.7%
ValueCountFrequency (%)
0 285
63.9%
1 161
36.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 269
60.3%
1 177
39.7%
ValueCountFrequency (%)
0 285
63.9%
1 161
36.1%

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value          Dataset A   Dataset B
3              240         236
1              116         110
2              90          100

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   1           1
2nd row   3           2
3rd row   1           2
4th row   2           1
5th row   2           3

Common Values

Dataset A
Value   Count   Frequency (%)
3       240     53.8%
1       116     26.0%
2       90      20.2%

Dataset B
Value   Count   Frequency (%)
3       236     52.9%
1       110     24.7%
2       100     22.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots of common values for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
3 240
53.8%
1 116
26.0%
2 90
 
20.2%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 240
53.8%
1 116
26.0%
2 90
 
20.2%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 240
53.8%
1 116
26.0%
2 90
 
20.2%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 240
53.8%
1 116
26.0%
2 90
 
20.2%
ValueCountFrequency (%)
3 236
52.9%
1 110
24.7%
2 100
22.4%

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          82
Median length   51          50
Mean length     27.192825   26.829596
Min length      13          13

Characters and Unicode

                      Dataset A   Dataset B
Total characters      12128       11966
Distinct characters   59          60
Distinct categories   7           7
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
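
The category names used in this report (Lowercase Letter, Uppercase Letter, Space Separator and so on) are Unicode general categories. As a sketch, Python's standard `unicodedata` module exposes them as two-letter codes, shown here for the first Dataset A sample name. Scripts and blocks are not available from `unicodedata`, so how the report derives them is outside this example.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters per Unicode general category (two-letter codes)."""
    return Counter(unicodedata.category(ch) for ch in text)


# Lu = Uppercase Letter, Ll = Lowercase Letter, Zs = Space Separator,
# Po = Other Punctuation, Ps = Open Punctuation, Pe = Close Punctuation.
counts = category_counts("Leader, Dr. Alice (Farnham)")

for code, n in counts.most_common():
    print(code, n)
```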

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                     Dataset B
1st row   Leader, Dr. Alice (Farnham)   Davidson, Mr. Thornton
2nd row   Connolly, Miss. Kate          Richards, Master. William Rowe
3rd row   Dick, Mr. Albert Adrian       Nicholls, Mr. Joseph Charles
4th row   Pernot, Mr. Rene              Silvey, Mrs. William Baird (Alice Munger)
5th row   Collander, Mr. Erik Gustaf    Asplund, Master. Edvin Rojj Felix

Word frequencies, Dataset A
Value                Count   Frequency (%)
mr                   266     14.5%
miss                 84      4.6%
mrs                  72      3.9%
william              36      2.0%
john                 21      1.1%
master               17      0.9%
henry                16      0.9%
george               14      0.8%
charles              12      0.7%
mary                 12      0.7%
Other values (887)   1283    70.0%

Word frequencies, Dataset B
Value                Count   Frequency (%)
mr                   260     14.4%
miss                 92      5.1%
mrs                  61      3.4%
william              32      1.8%
john                 21      1.2%
master               20      1.1%
henry                14      0.8%
charles              14      0.8%
thomas               13      0.7%
george               13      0.7%
Other values (872)   1261    70.0%
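
The word table lists lowercased tokens such as mr and miss, which suggests that names are split on whitespace and stripped of surrounding punctuation before counting. The report does not show its tokenizer, so the sketch below is an assumption that merely reproduces that kind of entry for the five Dataset A sample names.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Lowercase, split on whitespace, strip edge punctuation, then count."""
    words = []
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)
            if token:
                words.append(token)
    return Counter(words)


# The five Dataset A sample rows from the Sample table.
sample_a = [
    "Leader, Dr. Alice (Farnham)",
    "Connolly, Miss. Kate",
    "Dick, Mr. Albert Adrian",
    "Pernot, Mr. Rene",
    "Collander, Mr. Erik Gustaf",
]
counts = word_counts(sample_a)

print(counts.most_common(3))
```

Stripping only the outer punctuation keeps interior characters intact, which matches Ticket entries such as c.a and ston/o elsewhere in the report.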

Most occurring characters

ValueCountFrequency (%)
1388
 
11.4%
r 989
 
8.2%
e 861
 
7.1%
a 813
 
6.7%
n 661
 
5.5%
i 660
 
5.4%
s 635
 
5.2%
M 574
 
4.7%
l 538
 
4.4%
o 508
 
4.2%
Other values (49) 4501
37.1%
ValueCountFrequency (%)
1356
 
11.3%
r 965
 
8.1%
e 876
 
7.3%
a 834
 
7.0%
n 662
 
5.5%
s 654
 
5.5%
i 644
 
5.4%
M 550
 
4.6%
l 517
 
4.3%
o 502
 
4.2%
Other values (50) 4406
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7788
64.2%
Uppercase Letter 1840
 
15.2%
Space Separator 1388
 
11.4%
Other Punctuation 945
 
7.8%
Close Punctuation 81
 
0.7%
Open Punctuation 81
 
0.7%
Dash Punctuation 5
 
< 0.1%
ValueCountFrequency (%)
Lowercase Letter 7714
64.5%
Uppercase Letter 1806
 
15.1%
Space Separator 1356
 
11.3%
Other Punctuation 943
 
7.9%
Open Punctuation 70
 
0.6%
Close Punctuation 70
 
0.6%
Dash Punctuation 7
 
0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1388
100.0%
ValueCountFrequency (%)
1356
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 989
12.7%
e 861
11.1%
a 813
10.4%
n 661
8.5%
i 660
8.5%
s 635
8.2%
l 538
 
6.9%
o 508
 
6.5%
t 323
 
4.1%
h 269
 
3.5%
Other values (16) 1531
19.7%
ValueCountFrequency (%)
r 965
12.5%
e 876
11.4%
a 834
10.8%
n 662
8.6%
s 654
8.5%
i 644
8.3%
l 517
 
6.7%
o 502
 
6.5%
t 323
 
4.2%
d 253
 
3.3%
Other values (16) 1484
19.2%
Uppercase Letter
ValueCountFrequency (%)
M 574
31.2%
A 130
 
7.1%
J 119
 
6.5%
H 97
 
5.3%
C 89
 
4.8%
S 87
 
4.7%
E 80
 
4.3%
W 79
 
4.3%
B 69
 
3.8%
L 63
 
3.4%
Other values (15) 453
24.6%
ValueCountFrequency (%)
M 550
30.5%
A 122
 
6.8%
H 102
 
5.6%
J 102
 
5.6%
C 83
 
4.6%
S 82
 
4.5%
E 78
 
4.3%
B 75
 
4.2%
W 67
 
3.7%
G 65
 
3.6%
Other values (15) 480
26.6%
Other Punctuation
ValueCountFrequency (%)
. 446
47.2%
, 446
47.2%
" 50
 
5.3%
' 3
 
0.3%
ValueCountFrequency (%)
. 446
47.3%
, 446
47.3%
" 46
 
4.9%
' 4
 
0.4%
/ 1
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 81
100.0%
ValueCountFrequency (%)
) 70
100.0%
Open Punctuation
ValueCountFrequency (%)
( 81
100.0%
ValueCountFrequency (%)
( 70
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5
100.0%
ValueCountFrequency (%)
- 7
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9628
79.4%
Common 2500
 
20.6%
ValueCountFrequency (%)
Latin 9520
79.6%
Common 2446
 
20.4%

Most frequent character per script

Common
ValueCountFrequency (%)
1388
55.5%
. 446
 
17.8%
, 446
 
17.8%
) 81
 
3.2%
( 81
 
3.2%
" 50
 
2.0%
- 5
 
0.2%
' 3
 
0.1%
ValueCountFrequency (%)
1356
55.4%
. 446
 
18.2%
, 446
 
18.2%
( 70
 
2.9%
) 70
 
2.9%
" 46
 
1.9%
- 7
 
0.3%
' 4
 
0.2%
/ 1
 
< 0.1%
Latin
ValueCountFrequency (%)
r 989
 
10.3%
e 861
 
8.9%
a 813
 
8.4%
n 661
 
6.9%
i 660
 
6.9%
s 635
 
6.6%
M 574
 
6.0%
l 538
 
5.6%
o 508
 
5.3%
t 323
 
3.4%
Other values (41) 3066
31.8%
ValueCountFrequency (%)
r 965
 
10.1%
e 876
 
9.2%
a 834
 
8.8%
n 662
 
7.0%
s 654
 
6.9%
i 644
 
6.8%
M 550
 
5.8%
l 517
 
5.4%
o 502
 
5.3%
t 323
 
3.4%
Other values (41) 2993
31.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 12128
100.0%
ValueCountFrequency (%)
ASCII 11966
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1388
 
11.4%
r 989
 
8.2%
e 861
 
7.1%
a 813
 
6.7%
n 661
 
5.5%
i 660
 
5.4%
s 635
 
5.2%
M 574
 
4.7%
l 538
 
4.4%
o 508
 
4.2%
Other values (49) 4501
37.1%
ValueCountFrequency (%)
1356
 
11.3%
r 965
 
8.1%
e 876
 
7.3%
a 834
 
7.0%
n 662
 
5.5%
s 654
 
5.5%
i 644
 
5.4%
M 550
 
4.6%
l 517
 
4.3%
o 502
 
4.2%
Other values (50) 4406
36.8%

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Value          Dataset A   Dataset B
male           288         291
female         158         155

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7085202   4.6950673
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2100        2094
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
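
For a categorical variable, the total character count and the mean length follow directly from the category counts: each occurrence of male contributes 4 characters and each occurrence of female contributes 6. A small sketch with an illustrative helper:

```python
def length_stats(counts: dict[str, int]) -> tuple[int, float]:
    """Return (total characters, mean length) for a value -> count mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


# Sex counts for each dataset.
total_a, mean_a = length_stats({"male": 288, "female": 158})
total_b, mean_b = length_stats({"male": 291, "female": 155})

print(total_a, round(mean_a, 7))
print(total_b, round(mean_b, 7))
```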

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   female      male
2nd row   female      male
3rd row   male        male
4th row   male        female
5th row   male        male

Common Values

Dataset A
Value    Count   Frequency (%)
male     288     64.6%
female   158     35.4%

Dataset B
Value    Count   Frequency (%)
male     291     65.2%
female   155     34.8%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots of common values for Dataset A and Dataset B]

Most occurring characters

ValueCountFrequency (%)
e 604
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 158
 
7.5%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2100
100.0%
ValueCountFrequency (%)
Lowercase Letter 2094
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 604
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 158
 
7.5%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 2100
100.0%
ValueCountFrequency (%)
Latin 2094
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 604
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 158
 
7.5%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2100
100.0%
ValueCountFrequency (%)
ASCII 2094
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 604
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 158
 
7.5%
ValueCountFrequency (%)
e 601
28.7%
m 446
21.3%
a 446
21.3%
l 446
21.3%
f 155
 
7.4%

Age
Real number (ℝ)

                Dataset A   Dataset B
Distinct        76          72
Distinct (%)    20.9%       19.7%
Missing         82          81
Missing (%)     18.4%       18.2%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            30.596621   29.834712
Minimum         0.67        0.42
Maximum         80          74
Zeros           0           0
Zeros (%)       0.0%        0.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.67        0.42
5-th percentile             4.15        5.2
Q1                          21          21
median                      29          29
Q3                          39          38
95-th percentile            59.85       54.8
Maximum                     80          74
Range                       79.33       73.58
Interquartile range (IQR)   18          17

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                14.606008       13.679396
Coefficient of variation (CV)     0.47737325      0.45850604
Kurtosis                          0.32745385      0.053951798
Mean                              30.596621       29.834712
Median Absolute Deviation (MAD)   9               8
Skewness                          0.44700022      0.21634065
Sum                               11137.17        10889.67
Variance                          213.33548       187.12587
Monotonicity                      Not monotonic   Not monotonic
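
Age is the first numeric variable with missing values, and its figures are consistent with statistics being computed over the non-missing observations only: the mean equals Sum divided by (446 minus Missing), and Distinct (%) uses the same denominator. The report does not state its denominators, so the check below is an inference from the numbers, written with illustrative names.

```python
N_OBSERVATIONS = 446  # rows in each dataset


def non_missing_stats(total: float, distinct: int, missing: int) -> tuple[float, float]:
    """Return (mean, distinct %) computed over non-missing values only."""
    n_valid = N_OBSERVATIONS - missing
    return total / n_valid, 100.0 * distinct / n_valid


mean_a, distinct_pct_a = non_missing_stats(11137.17, 76, 82)
mean_b, distinct_pct_b = non_missing_stats(10889.67, 72, 81)

print(f"Dataset A: mean {mean_a:.6f}, distinct {distinct_pct_a:.1f}%")
print(f"Dataset B: mean {mean_b:.6f}, distinct {distinct_pct_b:.1f}%")
```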
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value               Count   Frequency (%)
22                  15      3.4%
28                  14      3.1%
36                  13      2.9%
27                  13      2.9%
24                  13      2.9%
31                  12      2.7%
19                  12      2.7%
25                  12      2.7%
18                  11      2.5%
26                  10      2.2%
Other values (66)   239     53.6%
(Missing)           82      18.4%

Common values, Dataset B
Value               Count   Frequency (%)
30                  16      3.6%
36                  14      3.1%
24                  13      2.9%
31                  12      2.7%
28                  12      2.7%
21                  12      2.7%
32                  11      2.5%
22                  11      2.5%
25                  11      2.5%
23                  10      2.2%
Other values (62)   243     54.5%
(Missing)           81      18.2%

Smallest values, Dataset A
Value   Count   Frequency (%)
0.67    1       0.2%
0.75    1       0.2%
0.83    1       0.2%
0.92    1       0.2%
1       3       0.7%
2       3       0.7%
3       3       0.7%
4       6       1.3%
5       1       0.2%
7       2       0.4%

Smallest values, Dataset B
Value   Count   Frequency (%)
0.42    1       0.2%
0.75    1       0.2%
1       5       1.1%
2       4       0.9%
3       2       0.4%
4       5       1.1%
5       1       0.2%
6       1       0.2%
7       3       0.7%
8       1       0.2%

SibSp
Real number (ℝ)

                Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.52914798   0.51793722
Minimum         0            0
Maximum         8            8
Zeros           292          303
Zeros (%)       65.5%        67.9%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            2           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.0505441       1.04651
Coefficient of variation (CV)     1.9853503       2.0205344
Kurtosis                          19.208665       15.082131
Mean                              0.52914798      0.51793722
Median Absolute Deviation (MAD)   0               0
Skewness                          3.7425595       3.3556234
Sum                               236             231
Variance                          1.1036429       1.0951832
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Common values, Dataset A
Value   Count   Frequency (%)
0       292     65.5%
1       119     26.7%
2       16      3.6%
4       7       1.6%
3       6       1.3%
5       3       0.7%
8       3       0.7%

Common values, Dataset B
Value   Count   Frequency (%)
0       303     67.9%
1       105     23.5%
2       14      3.1%
3       10      2.2%
4       8       1.8%
5       4       0.9%
8       2       0.4%
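
Because SibSp takes only seven distinct values, its mean and variance can be recomputed exactly from the value counts. The sketch below does so and reproduces the reported Variance when the sum of squared deviations is divided by n minus 1, which suggests (but does not prove) that the report uses the sample variance.

```python
def mean_and_sample_variance(counts: dict[int, int]) -> tuple[float, float]:
    """Mean and sample variance (n - 1 denominator) from value -> count pairs."""
    n = sum(counts.values())
    mean = sum(value * c for value, c in counts.items()) / n
    squared_dev = sum(c * (value - mean) ** 2 for value, c in counts.items())
    return mean, squared_dev / (n - 1)


# SibSp value counts from the Common values tables.
sibsp_a = {0: 292, 1: 119, 2: 16, 4: 7, 3: 6, 5: 3, 8: 3}
sibsp_b = {0: 303, 1: 105, 2: 14, 3: 10, 4: 8, 5: 4, 8: 2}

mean_a, var_a = mean_and_sample_variance(sibsp_a)
mean_b, var_b = mean_and_sample_variance(sibsp_b)

print(f"Dataset A: mean {mean_a:.8f}, variance {var_a:.7f}")
print(f"Dataset B: mean {mean_b:.8f}, variance {var_b:.7f}")
```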

Smallest values, Dataset A
Value   Count   Frequency (%)
0       292     65.5%
1       119     26.7%
2       16      3.6%
3       6       1.3%
4       7       1.6%
5       3       0.7%
8       3       0.7%

Smallest values, Dataset B
Value   Count   Frequency (%)
0       303     67.9%
1       105     23.5%
2       14      3.1%
3       10      2.2%
4       8       1.8%
5       4       0.9%
8       2       0.4%

Parch
Real number (ℝ)

                Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.38789238   0.41928251
Minimum         0            0
Maximum         6            6
Zeros           338          332
Zeros (%)       75.8%        74.4%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           1
95-th percentile            2           2
Maximum                     6           6
Range                       6           6
Interquartile range (IQR)   0           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.83450129      0.84878521
Coefficient of variation (CV)     2.1513733       2.0243754
Kurtosis                          11.488517       9.0163041
Mean                              0.38789238      0.41928251
Median Absolute Deviation (MAD)   0               0
Skewness                          2.9689634       2.6241397
Sum                               173             187
Variance                          0.6963924       0.72043634
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=7)
Common values, Dataset A
Value   Count   Frequency (%)
0       338     75.8%
1       63      14.1%
2       36      8.1%
3       3       0.7%
5       3       0.7%
4       2       0.4%
6       1       0.2%

Common values, Dataset B
Value   Count   Frequency (%)
0       332     74.4%
1       59      13.2%
2       47      10.5%
4       3       0.7%
5       2       0.4%
3       2       0.4%
6       1       0.2%

Smallest values, Dataset A
Value   Count   Frequency (%)
0       338     75.8%
1       63      14.1%
2       36      8.1%
3       3       0.7%
4       2       0.4%
5       3       0.7%
6       1       0.2%

Smallest values, Dataset B
Value   Count   Frequency (%)
0       332     74.4%
1       59      13.2%
2       47      10.5%
3       2       0.4%
4       3       0.7%
5       2       0.4%
6       1       0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       381         377
Distinct (%)   85.4%       84.5%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.6860987   6.7713004
Min length      3           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2982        3020
Distinct characters   35          32
Distinct categories   5           5
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       329         325
Unique (%)   73.8%       72.9%
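
Ticket is the first variable where Distinct (381 and 377) and Unique (329 and 325) differ. The figures are consistent with Distinct counting how many different values occur and Unique counting values that occur exactly once; that reading is an assumption drawn from the tables, not something the report states. The sketch below illustrates the distinction on a small made-up list of tickets.

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Hypothetical toy data: "2144" is shared by three passengers.
tickets = ["PC 17599", "2144", "2144", "C.A. 33112", "2144", "113507"]
distinct, unique = distinct_and_unique(tickets)

print(distinct, unique)
```

Under this reading a shared ticket still counts once towards Distinct but not at all towards Unique, which is why Unique can never exceed Distinct.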

Sample

          Dataset A       Dataset B
1st row   17465           F.C. 12750
2nd row   370373          29106
3rd row   17474           C.A. 33112
4th row   SC/PARIS 2131   13507
5th row   248740          347077
ValueCountFrequency (%)
pc 33
 
5.8%
c.a 11
 
1.9%
a/5 9
 
1.6%
ston/o 8
 
1.4%
2 8
 
1.4%
ca 7
 
1.2%
w./c 5
 
0.9%
sc/paris 5
 
0.9%
line 4
 
0.7%
2144 4
 
0.7%
Other values (398) 471
83.4%
ValueCountFrequency (%)
pc 27
 
4.8%
ca 7
 
1.2%
c.a 7
 
1.2%
2 7
 
1.2%
ston/o 7
 
1.2%
w./c 6
 
1.1%
ston/o2 5
 
0.9%
a/5 5
 
0.9%
2144 5
 
0.9%
soton/o.q 5
 
0.9%
Other values (401) 480
85.6%

Most occurring characters

ValueCountFrequency (%)
3 379
12.7%
1 334
11.2%
2 280
9.4%
7 256
8.6%
4 235
 
7.9%
6 208
 
7.0%
0 207
 
6.9%
5 201
 
6.7%
9 157
 
5.3%
8 137
 
4.6%
Other values (25) 588
19.7%
ValueCountFrequency (%)
3 356
11.8%
1 349
11.6%
2 302
10.0%
7 245
8.1%
4 241
 
8.0%
6 216
 
7.2%
5 196
 
6.5%
0 193
 
6.4%
9 170
 
5.6%
8 145
 
4.8%
Other values (22) 607
20.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2394
80.3%
Uppercase Letter 320
 
10.7%
Other Punctuation 136
 
4.6%
Space Separator 119
 
4.0%
Lowercase Letter 13
 
0.4%
ValueCountFrequency (%)
Decimal Number 2413
79.9%
Uppercase Letter 329
 
10.9%
Other Punctuation 155
 
5.1%
Space Separator 115
 
3.8%
Lowercase Letter 8
 
0.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 379
15.8%
1 334
14.0%
2 280
11.7%
7 256
10.7%
4 235
9.8%
6 208
8.7%
0 207
8.6%
5 201
8.4%
9 157
6.6%
8 137
 
5.7%
ValueCountFrequency (%)
3 356
14.8%
1 349
14.5%
2 302
12.5%
7 245
10.2%
4 241
10.0%
6 216
9.0%
5 196
8.1%
0 193
8.0%
9 170
7.0%
8 145
6.0%
Space Separator
ValueCountFrequency (%)
119
100.0%
ValueCountFrequency (%)
115
100.0%
Other Punctuation
ValueCountFrequency (%)
. 88
64.7%
/ 48
35.3%
ValueCountFrequency (%)
. 104
67.1%
/ 51
32.9%
Uppercase Letter
ValueCountFrequency (%)
C 76
23.8%
P 50
15.6%
A 41
12.8%
O 40
12.5%
S 36
11.2%
N 19
 
5.9%
T 15
 
4.7%
I 9
 
2.8%
W 7
 
2.2%
Q 6
 
1.9%
Other values (6) 21
 
6.6%
ValueCountFrequency (%)
C 72
21.9%
O 55
16.7%
P 47
14.3%
S 43
13.1%
A 34
10.3%
N 22
 
6.7%
T 20
 
6.1%
W 10
 
3.0%
I 6
 
1.8%
Q 6
 
1.8%
Other values (5) 14
 
4.3%
Lowercase Letter
ValueCountFrequency (%)
a 4
30.8%
s 3
23.1%
r 2
15.4%
i 2
15.4%
l 1
 
7.7%
e 1
 
7.7%
ValueCountFrequency (%)
a 2
25.0%
r 2
25.0%
i 2
25.0%
s 2
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common 2649
88.8%
Latin 333
 
11.2%
ValueCountFrequency (%)
Common 2683
88.8%
Latin 337
 
11.2%

Most frequent character per script

Common
ValueCountFrequency (%)
3 379
14.3%
1 334
12.6%
2 280
10.6%
7 256
9.7%
4 235
8.9%
6 208
7.9%
0 207
7.8%
5 201
7.6%
9 157
5.9%
8 137
 
5.2%
Other values (3) 255
9.6%
ValueCountFrequency (%)
3 356
13.3%
1 349
13.0%
2 302
11.3%
7 245
9.1%
4 241
9.0%
6 216
8.1%
5 196
7.3%
0 193
7.2%
9 170
6.3%
8 145
5.4%
Other values (3) 270
10.1%
Latin
ValueCountFrequency (%)
C 76
22.8%
P 50
15.0%
A 41
12.3%
O 40
12.0%
S 36
10.8%
N 19
 
5.7%
T 15
 
4.5%
I 9
 
2.7%
W 7
 
2.1%
Q 6
 
1.8%
Other values (12) 34
10.2%
ValueCountFrequency (%)
C 72
21.4%
O 55
16.3%
P 47
13.9%
S 43
12.8%
A 34
10.1%
N 22
 
6.5%
T 20
 
5.9%
W 10
 
3.0%
I 6
 
1.8%
Q 6
 
1.8%
Other values (9) 22
 
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2982
100.0%
ValueCountFrequency (%)
ASCII 3020
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 379
12.7%
1 334
11.2%
2 280
9.4%
7 256
8.6%
4 235
 
7.9%
6 208
 
7.0%
0 207
 
6.9%
5 201
 
6.7%
9 157
 
5.3%
8 137
 
4.6%
Other values (25) 588
19.7%
ValueCountFrequency (%)
3 356
11.8%
1 349
11.6%
2 302
10.0%
7 245
8.1%
4 241
 
8.0%
6 216
 
7.2%
5 196
 
6.5%
0 193
 
6.4%
9 170
 
5.6%
8 145
 
4.8%
Other values (22) 607
20.1%

Fare
Real number (ℝ)

                Dataset A   Dataset B
Distinct        183         177
Distinct (%)    41.0%       39.7%
Missing         0           0
Missing (%)     0.0%        0.0%
Infinite        0           0
Infinite (%)    0.0%        0.0%
Mean            33.987443   33.034996
Minimum         0           0
Maximum         512.3292    512.3292
Zeros           9           9
Zeros (%)       2.0%        2.0%
Negative        0           0
Negative (%)    0.0%        0.0%
Memory size     7.0 KiB     7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.162525
Q1                          7.925       7.95625
median                      14.5        14.47915
Q3                          31.3875     30.5
95-th percentile            130.2375    113.275
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   23.4625     22.54375

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                50.540167       53.440603
Coefficient of variation (CV)     1.4870247       1.6176967
Kurtosis                          23.254427       32.793948
Mean                              33.987443       33.034996
Median Absolute Deviation (MAD)   7.21875         7.10625
Skewness                          3.9724626       4.8980006
Sum                               15158.4         14733.608
Variance                          2554.3084       2855.898
Monotonicity                      Not monotonic   Not monotonic
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                Count   Frequency (%)
8.05                 22      4.9%
7.75                 19      4.3%
13                   19      4.3%
26                   18      4.0%
7.8958               17      3.8%
10.5                 11      2.5%
7.925                10      2.2%
8.6625               9       2.0%
0                    9       2.0%
7.775                9       2.0%
Other values (173)   303     67.9%

Common values, Dataset B
Value                Count   Frequency (%)
13                   29      6.5%
8.05                 22      4.9%
7.8958               18      4.0%
7.75                 16      3.6%
26                   16      3.6%
7.925                10      2.2%
7.775                9       2.0%
0                    9       2.0%
26.55                7       1.6%
10.5                 7       1.6%
Other values (167)   303     67.9%

Smallest values, Dataset A
Value    Count   Frequency (%)
0        9       2.0%
5        1       0.2%
6.2375   1       0.2%
6.75     1       0.2%
6.8583   1       0.2%
6.95     1       0.2%
6.975    1       0.2%
7.0458   1       0.2%
7.05     2       0.4%
7.0542   2       0.4%

Smallest values, Dataset B
Value    Count   Frequency (%)
0        9       2.0%
5        1       0.2%
6.4958   1       0.2%
6.75     1       0.2%
6.8583   1       0.2%
6.975    2       0.4%
7.0458   1       0.2%
7.05     3       0.7%
7.0542   1       0.2%
7.125    2       0.4%

Cabin
Text

               Dataset A   Dataset B
Distinct       91          89
Distinct (%)   83.5%       85.6%
Missing        337         342
Missing (%)    75.6%       76.7%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      15          15
Median length   3           3
Mean length     3.5963303   3.8365385
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      392         399
Distinct characters   19          18
Distinct categories   3           3
Distinct scripts      2           2
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       75          76
Unique (%)   68.8%       73.1%

Sample

          Dataset A   Dataset B
1st row   D17         B71
2nd row   B20         E44
3rd row   D48         C23 C25 C27
4th row   F33         B77
5th row   E46         F E69
ValueCountFrequency (%)
c23 3
 
2.3%
c27 3
 
2.3%
d 3
 
2.3%
c25 3
 
2.3%
e24 2
 
1.6%
c65 2
 
1.6%
c68 2
 
1.6%
c2 2
 
1.6%
b55 2
 
1.6%
b53 2
 
1.6%
Other values (93) 105
81.4%
ValueCountFrequency (%)
c23 4
 
3.1%
c27 4
 
3.1%
c25 4
 
3.1%
d 2
 
1.6%
f 2
 
1.6%
c124 2
 
1.6%
b77 2
 
1.6%
b55 2
 
1.6%
b53 2
 
1.6%
b18 2
 
1.6%
Other values (91) 101
79.5%

Most occurring characters

ValueCountFrequency (%)
2 38
 
9.7%
C 37
 
9.4%
3 34
 
8.7%
B 34
 
8.7%
1 33
 
8.4%
5 28
 
7.1%
6 26
 
6.6%
E 23
 
5.9%
4 20
 
5.1%
20
 
5.1%
Other values (9) 99
25.3%
ValueCountFrequency (%)
C 42
 
10.5%
2 36
 
9.0%
1 34
 
8.5%
B 34
 
8.5%
5 32
 
8.0%
3 27
 
6.8%
6 24
 
6.0%
23
 
5.8%
4 22
 
5.5%
7 20
 
5.0%
Other values (8) 105
26.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 243
62.0%
Uppercase Letter 129
32.9%
Space Separator 20
 
5.1%
ValueCountFrequency (%)
Decimal Number 249
62.4%
Uppercase Letter 127
31.8%
Space Separator 23
 
5.8%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 38
15.6%
3 34
14.0%
1 33
13.6%
5 28
11.5%
6 26
10.7%
4 20
8.2%
8 20
8.2%
9 15
 
6.2%
7 15
 
6.2%
0 14
 
5.8%
ValueCountFrequency (%)
2 36
14.5%
1 34
13.7%
5 32
12.9%
3 27
10.8%
6 24
9.6%
4 22
8.8%
7 20
8.0%
8 19
7.6%
0 19
7.6%
9 16
6.4%
Uppercase Letter
ValueCountFrequency (%)
C 37
28.7%
B 34
26.4%
E 23
17.8%
D 17
13.2%
A 10
 
7.8%
F 5
 
3.9%
G 2
 
1.6%
T 1
 
0.8%
ValueCountFrequency (%)
C 42
33.1%
B 34
26.8%
D 20
15.7%
E 17
13.4%
A 6
 
4.7%
F 5
 
3.9%
G 3
 
2.4%
Space Separator
ValueCountFrequency (%)
20
100.0%
ValueCountFrequency (%)
23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 263
67.1%
Latin 129
32.9%
ValueCountFrequency (%)
Common 272
68.2%
Latin 127
31.8%

Most frequent character per script

Common
ValueCountFrequency (%)
2 38
14.4%
3 34
12.9%
1 33
12.5%
5 28
10.6%
6 26
9.9%
4 20
7.6%
20
7.6%
8 20
7.6%
9 15
 
5.7%
7 15
 
5.7%
ValueCountFrequency (%)
2 36
13.2%
1 34
12.5%
5 32
11.8%
3 27
9.9%
6 24
8.8%
23
8.5%
4 22
8.1%
7 20
7.4%
8 19
7.0%
0 19
7.0%
Latin
ValueCountFrequency (%)
C 37
28.7%
B 34
26.4%
E 23
17.8%
D 17
13.2%
A 10
 
7.8%
F 5
 
3.9%
G 2
 
1.6%
T 1
 
0.8%
ValueCountFrequency (%)
C 42
33.1%
B 34
26.8%
D 20
15.7%
E 17
13.4%
A 6
 
4.7%
F 5
 
3.9%
G 3
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 392
100.0%
ValueCountFrequency (%)
ASCII 399
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 38
 
9.7%
C 37
 
9.4%
3 34
 
8.7%
B 34
 
8.7%
1 33
 
8.4%
5 28
 
7.1%
6 26
 
6.6%
E 23
 
5.9%
4 20
 
5.1%
20
 
5.1%
Other values (9) 99
25.3%
ValueCountFrequency (%)
C 42
 
10.5%
2 36
 
9.0%
1 34
 
8.5%
B 34
 
8.5%
5 32
 
8.0%
3 27
 
6.8%
6 24
 
6.0%
23
 
5.8%
4 22
 
5.5%
7 20
 
5.0%
Other values (8) 105
26.3%

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           1
Missing (%)    0.2%        0.2%
Memory size    7.0 KiB     7.0 KiB

Value          Dataset A   Dataset B
S              320         316
C              83          93
Q              42          36

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         445
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           S
2nd row   Q           S
3rd row   S           S
4th row   C           S
5th row   S           S

Common Values

Dataset A
Value       Count   Frequency (%)
S           320     71.7%
C           83      18.6%
Q           42      9.4%
(Missing)   1       0.2%

Dataset B
Value       Count   Frequency (%)
S           316     70.9%
C           93      20.9%
Q           36      8.1%
(Missing)   1       0.2%
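
Embarked has one missing value, so the same count produces two slightly different percentages in this report: the Common Values table divides by all 446 observations, while the character tables further down divide by the 445 characters actually present. The sketch below shows both denominators for the Dataset A count of S; the helper name is illustrative.

```python
def share(count: int, total: int) -> str:
    """Percentage string for count out of total, one decimal place."""
    return f"{100.0 * count / total:.1f}%"


n_observations = 446
n_missing = 1

# Dataset A, value "S" (320 occurrences).
with_missing = share(320, n_observations)
without_missing = share(320, n_observations - n_missing)

print(with_missing, without_missing)
```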

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots of common values for Dataset A and Dataset B]
ValueCountFrequency (%)
s 320
71.9%
c 83
 
18.7%
q 42
 
9.4%
ValueCountFrequency (%)
s 316
71.0%
c 93
 
20.9%
q 36
 
8.1%

Most occurring characters

Value | Count | Frequency (%)
S | 320 | 71.9%
C | 83 | 18.7%
Q | 42 | 9.4%

Value | Count | Frequency (%)
S | 316 | 71.0%
C | 93 | 20.9%
Q | 36 | 8.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
S | 320 | 71.9%
C | 83 | 18.7%
Q | 42 | 9.4%

Value | Count | Frequency (%)
S | 316 | 71.0%
C | 93 | 20.9%
Q | 36 | 8.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
S | 320 | 71.9%
C | 83 | 18.7%
Q | 42 | 9.4%

Value | Count | Frequency (%)
S | 316 | 71.0%
C | 93 | 20.9%
Q | 36 | 8.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
S | 320 | 71.9%
C | 83 | 18.7%
Q | 42 | 9.4%

Value | Count | Frequency (%)
S | 316 | 71.0%
C | 93 | 20.9%
Q | 36 | 8.1%

Interactions

[Figures: 25 pairs of interaction plots, each pair showing Dataset A and Dataset B; images are not available in this text version]

Correlations

[Figures: correlation plots for Dataset A and Dataset B]

Dataset A

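Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | -0.004 | -0.078 | 0.067 | 0.016 | 0.179 | 0.061 | 0.078 | 0.000
Age | -0.004 | 1.000 | -0.113 | -0.145 | 0.208 | 0.146 | 0.267 | 0.085 | 0.112
SibSp | -0.078 | -0.113 | 1.000 | 0.428 | 0.450 | 0.173 | 0.131 | 0.221 | 0.059
Parch | 0.067 | -0.145 | 0.428 | 1.000 | 0.433 | 0.128 | 0.043 | 0.218 | 0.046
Fare | 0.016 | 0.208 | 0.450 | 0.433 | 1.000 | 0.301 | 0.504 | 0.229 | 0.241
Survived | 0.179 | 0.146 | 0.173 | 0.128 | 0.301 | 1.000 | 0.382 | 0.620 | 0.141
Pclass | 0.061 | 0.267 | 0.131 | 0.043 | 0.504 | 0.382 | 1.000 | 0.176 | 0.276
Sex | 0.078 | 0.085 | 0.221 | 0.218 | 0.229 | 0.620 | 0.176 | 1.000 | 0.199
Embarked | 0.000 | 0.112 | 0.059 | 0.046 | 0.241 | 0.141 | 0.276 | 0.199 | 1.000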

Dataset B

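Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.095 | -0.071 | 0.037 | -0.022 | 0.000 | 0.065 | 0.000 | 0.000
Age | 0.095 | 1.000 | -0.177 | -0.268 | 0.095 | 0.075 | 0.236 | 0.077 | 0.000
SibSp | -0.071 | -0.177 | 1.000 | 0.415 | 0.457 | 0.049 | 0.107 | 0.158 | 0.092
Parch | 0.037 | -0.268 | 0.415 | 1.000 | 0.422 | 0.058 | 0.000 | 0.293 | 0.076
Fare | -0.022 | 0.095 | 0.457 | 0.422 | 1.000 | 0.284 | 0.462 | 0.256 | 0.222
Survived | 0.000 | 0.075 | 0.049 | 0.058 | 0.284 | 1.000 | 0.361 | 0.514 | 0.227
Pclass | 0.065 | 0.236 | 0.107 | 0.000 | 0.462 | 0.361 | 1.000 | 0.119 | 0.266
Sex | 0.000 | 0.077 | 0.158 | 0.293 | 0.256 | 0.514 | 0.119 | 1.000 | 0.137
Embarked | 0.000 | 0.000 | 0.092 | 0.076 | 0.222 | 0.227 | 0.266 | 0.137 | 1.000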
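
The two matrices can be scanned for the pairs behind the High Correlation alerts listed in the report overview. The sketch below does this with plain Python. The 0.5 cut-off and the `high_correlation_pairs` helper are assumptions made for illustration. The cut-off was chosen because it reproduces the alerts shown (Fare/Pclass only in Dataset A, Survived/Sex in both), not because the report states it.

```python
from itertools import combinations

COLUMNS = ["PassengerId", "Age", "SibSp", "Parch", "Fare",
           "Survived", "Pclass", "Sex", "Embarked"]

# Values copied from the two correlation tables above.
DATASET_A = [
    [1.000, -0.004, -0.078, 0.067, 0.016, 0.179, 0.061, 0.078, 0.000],
    [-0.004, 1.000, -0.113, -0.145, 0.208, 0.146, 0.267, 0.085, 0.112],
    [-0.078, -0.113, 1.000, 0.428, 0.450, 0.173, 0.131, 0.221, 0.059],
    [0.067, -0.145, 0.428, 1.000, 0.433, 0.128, 0.043, 0.218, 0.046],
    [0.016, 0.208, 0.450, 0.433, 1.000, 0.301, 0.504, 0.229, 0.241],
    [0.179, 0.146, 0.173, 0.128, 0.301, 1.000, 0.382, 0.620, 0.141],
    [0.061, 0.267, 0.131, 0.043, 0.504, 0.382, 1.000, 0.176, 0.276],
    [0.078, 0.085, 0.221, 0.218, 0.229, 0.620, 0.176, 1.000, 0.199],
    [0.000, 0.112, 0.059, 0.046, 0.241, 0.141, 0.276, 0.199, 1.000],
]
DATASET_B = [
    [1.000, 0.095, -0.071, 0.037, -0.022, 0.000, 0.065, 0.000, 0.000],
    [0.095, 1.000, -0.177, -0.268, 0.095, 0.075, 0.236, 0.077, 0.000],
    [-0.071, -0.177, 1.000, 0.415, 0.457, 0.049, 0.107, 0.158, 0.092],
    [0.037, -0.268, 0.415, 1.000, 0.422, 0.058, 0.000, 0.293, 0.076],
    [-0.022, 0.095, 0.457, 0.422, 1.000, 0.284, 0.462, 0.256, 0.222],
    [0.000, 0.075, 0.049, 0.058, 0.284, 1.000, 0.361, 0.514, 0.227],
    [0.065, 0.236, 0.107, 0.000, 0.462, 0.361, 1.000, 0.119, 0.266],
    [0.000, 0.077, 0.158, 0.293, 0.256, 0.514, 0.119, 1.000, 0.137],
    [0.000, 0.000, 0.092, 0.076, 0.222, 0.227, 0.266, 0.137, 1.000],
]


def high_correlation_pairs(matrix, columns, threshold=0.5):
    """List (col_i, col_j, value) for each off-diagonal pair at or above the threshold.

    The default threshold of 0.5 is an assumption, not a value stated in the report.
    """
    return [
        (columns[i], columns[j], matrix[i][j])
        for i, j in combinations(range(len(columns)), 2)
        if abs(matrix[i][j]) >= threshold
    ]


print(high_correlation_pairs(DATASET_A, COLUMNS))
print(high_correlation_pairs(DATASET_B, COLUMNS))
```

Only the upper triangle is visited, so each symmetric pair is reported once rather than twice as in the alert list.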

Missing values

[Figures for Dataset A and Dataset B] A simple visualization of nullity by column.

[Figures for Dataset A and Dataset B] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Figures for Dataset A and Dataset B] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
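
To make "nullity correlation" concrete, the sketch below computes a Pearson correlation between 0/1 presence indicators for two columns. This is one common way to define it and is an assumption here, since the report does not state the exact method behind its heatmap. The `rows` data and both helper functions are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a column that is always present (or always missing) has no nullity signal
    return cov / sqrt(var_x * var_y)


def nullity_correlation(rows, col_a, col_b):
    """Correlate presence (1) versus absence (0) of two columns across rows."""
    present_a = [0 if row[col_a] is None else 1 for row in rows]
    present_b = [0 if row[col_b] is None else 1 for row in rows]
    return pearson(present_a, present_b)


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"Age": 22.0, "Cabin": None, "Sex": "male"},
    {"Age": None, "Cabin": None, "Sex": "female"},
    {"Age": 38.0, "Cabin": "C85", "Sex": "female"},
    {"Age": None, "Cabin": None, "Sex": "male"},
    {"Age": 54.0, "Cabin": "E46", "Sex": "male"},
    {"Age": 2.0, "Cabin": None, "Sex": "male"},
]

print(nullity_correlation(rows, "Age", "Cabin"))
print(nullity_correlation(rows, "Age", "Sex"))
```

A positive value means the two columns tend to be present or missing together. A fully observed column such as Sex in the toy data yields no value at all, so it would carry no signal in a heatmap of this kind.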

Sample

Dataset A

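(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
796 | 797 | 1 | 1 | Leader, Dr. Alice (Farnham) | female | 49.0 | 0 | 0 | 17465 | 25.9292 | D17 | S
289 | 290 | 1 | 3 | Connolly, Miss. Kate | female | 22.0 | 0 | 0 | 370373 | 7.7500 | NaN | Q
690 | 691 | 1 | 1 | Dick, Mr. Albert Adrian | male | 31.0 | 1 | 0 | 17474 | 57.0000 | B20 | S
181 | 182 | 0 | 2 | Pernot, Mr. Rene | male | NaN | 0 | 0 | SC/PARIS 2131 | 15.0500 | NaN | C
342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.0 | 0 | 0 | 248740 | 13.0000 | NaN | S
376 | 377 | 1 | 3 | Landergren, Miss. Aurora Adelia | female | 22.0 | 0 | 0 | C 7077 | 7.2500 | NaN | S
313 | 314 | 0 | 3 | Hendekovic, Mr. Ignjac | male | 28.0 | 0 | 0 | 349243 | 7.8958 | NaN | S
659 | 660 | 0 | 1 | Newell, Mr. Arthur Webster | male | 58.0 | 0 | 2 | 35273 | 113.2750 | D48 | C
516 | 517 | 1 | 2 | Lemore, Mrs. (Amelia Milley) | female | 34.0 | 0 | 0 | C.A. 34260 | 10.5000 | F33 | S
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S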

Dataset B

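(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
671 | 672 | 0 | 1 | Davidson, Mr. Thornton | male | 31.0 | 1 | 0 | F.C. 12750 | 52.0000 | B71 | S
407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S
145 | 146 | 0 | 2 | Nicholls, Mr. Joseph Charles | male | 19.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S
577 | 578 | 1 | 1 | Silvey, Mrs. William Baird (Alice Munger) | female | 39.0 | 1 | 0 | 13507 | 55.9000 | E44 | S
261 | 262 | 1 | 3 | Asplund, Master. Edvin Rojj Felix | male | 3.0 | 4 | 2 | 347077 | 31.3875 | NaN | S
428 | 429 | 0 | 3 | Flynn, Mr. James | male | NaN | 0 | 0 | 364851 | 7.7500 | NaN | Q
727 | 728 | 1 | 3 | Mannion, Miss. Margareth | female | NaN | 0 | 0 | 36866 | 7.7375 | NaN | Q
438 | 439 | 0 | 1 | Fortune, Mr. Mark | male | 64.0 | 1 | 4 | 19950 | 263.0000 | C23 C25 C27 | S
257 | 258 | 1 | 1 | Cherry, Miss. Gladys | female | 30.0 | 0 | 0 | 110152 | 86.5000 | B77 | S
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S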

Dataset A

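(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S
666 | 667 | 0 | 2 | Butler, Mr. Reginald Fenton | male | 25.0 | 0 | 0 | 234686 | 13.0000 | NaN | S
593 | 594 | 0 | 3 | Bourke, Miss. Mary | female | NaN | 0 | 2 | 364848 | 7.7500 | NaN | Q
812 | 813 | 0 | 2 | Slemen, Mr. Richard James | male | 35.0 | 0 | 0 | 28206 | 10.5000 | NaN | S
323 | 324 | 1 | 2 | Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) | female | 22.0 | 1 | 1 | 248738 | 29.0000 | NaN | S
425 | 426 | 0 | 3 | Wiseman, Mr. Phillippe | male | NaN | 0 | 0 | A/4. 34244 | 7.2500 | NaN | S
212 | 213 | 0 | 3 | Perkin, Mr. John Henry | male | 22.0 | 0 | 0 | A/5 21174 | 7.2500 | NaN | S
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
635 | 636 | 1 | 2 | Davis, Miss. Mary | female | 28.0 | 0 | 0 | 237668 | 13.0000 | NaN | S
116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | NaN | Q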

Dataset B

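(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
12 | 13 | 0 | 3 | Saundercock, Mr. William Henry | male | 20.0 | 0 | 0 | A/5. 2151 | 8.0500 | NaN | S
754 | 755 | 1 | 2 | Herman, Mrs. Samuel (Jane Laver) | female | 48.0 | 1 | 2 | 220845 | 65.0000 | NaN | S
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S
168 | 169 | 0 | 1 | Baumann, Mr. John D | male | NaN | 0 | 0 | PC 17318 | 25.9250 | NaN | S
877 | 878 | 0 | 3 | Petroff, Mr. Nedelio | male | 19.0 | 0 | 0 | 349212 | 7.8958 | NaN | S
346 | 347 | 1 | 2 | Smith, Miss. Marion Elsie | female | 40.0 | 0 | 0 | 31418 | 13.0000 | NaN | S
347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S
189 | 190 | 0 | 3 | Turcin, Mr. Stjepan | male | 36.0 | 0 | 0 | 349247 | 7.8958 | NaN | S
91 | 92 | 0 | 3 | Andreasson, Mr. Paul Edvin | male | 20.0 | 0 | 0 | 347466 | 7.8542 | NaN | S
543 | 544 | 1 | 2 | Beane, Mr. Edward | male | 32.0 | 1 | 0 | 2908 | 26.0000 | NaN | S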

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.